Assignment for Homework 3
Parts 1-4: characterizing variability in 1 field. DUE DATE: MON APRIL 4
To begin, each student will pick a pair of data fields in (lon, t) to analyze. Sign up here.
Example data grab (Matlab, then IDL):
Matlab:
x = double(ncload('/Users/bem/Downloads/hw3_data.nc','olr'));
y = double(ncload('/Users/bem/Downloads/hw3_data.nc','precip'));
lon = double(ncload('/Users/bem/Downloads/hw3_data.nc','lon'));
time = linspace(1982., 2001. + 11./12, 240);   % 240 monthly times spanning 20 years, to match the IDL line below
IDL:
cdf2idl, '/Users/bem/Downloads/hw3_data.nc'
@hw3_data.idl
x = olr
y = precip
time = 1982. + findgen(240)/12.
NLONS = 144;
NYEARS = 20;
NMONS = 12 * NYEARS;
With the data named x and y, the rest of the code can be pretty standard, but on your results presentation page please make the plot titles show the correct field names!
1. Get the data, read it into memory, and check it with a contour plot against Milan's examples.
Get used to
which dimension (first or second of the 144x240 array, as shown by whos or help) is space (144) and which is time (240)
which is called a "row" or "column" in the matrix manipulations
which array indices vary most rapidly as you step through the data in memory order
so that months 1,2,3,....13,14,... can be reformed or reshaped from 240 to 12x20
These are just language conventions, with no fundamental right or wrong way, so it is irritating to have to learn. Happily, both IDL and Matlab show (144,240) for these arrays, with longitude as dimension "1" and time as dimension "2" in functions like mean(x,1) or total(x,1). In both languages, the arrays need to be transposed or permuted to put time series (240 values) in continuous memory order, so that it can be re-interpreted as 12x20, in order to average over the 20 and make a 12-month climatology.
You don't need to turn in anything here, it is just step 1.
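A minimal Matlab sketch of the step-1 check (assuming x, lon, and time as loaded above; Milan's example plots, not this code, are the reference):
whos x                              % should report 144x240: lon is dimension 1, time is dimension 2
contourf(time, lon, x); colorbar    % quick-look contour plot to compare against Milan's examples
xlabel('year'); ylabel('longitude'); title('MYFIELDNAME (raw field)')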
2. Your field 1 means and variances: space, time, spacetime
The GRAND (spacetime) mean of my x and y (including units): Matlab: mean( x(:) )   IDL: print, mean(x)
The GRAND (spacetime) variance of my x and y (including units): Matlab: var( x(:), 1)   IDL: print, mean(x^2) - mean(x)^2
NOTE: If your variance budgets are to add up exactly, you need the /N variance, variance = mean(x^2) - mean(x)^2 (the mean squared deviation),
NOT the /N-1 variance "estimator" that is the default in both the IDL and Matlab var() functions.
In Matlab, this is var(x, 1).
In IDL, return, mean(x^2) - mean(x)^2 can easily be packaged into a one-line function: var_n.pro
The GRAND standard deviations are: Matlab: std(x(:), 1)   IDL: print, sqrt(var_n(x))
The SPATIAL variance of my TIME MEAN longitude section is: _ (units). Matlab: var( mean(x,2), 1)   IDL: print, var_n( total(x,2)/NMONS )
The TEMPORAL variance of my LONGITUDE MEAN time series is: _ (units). Matlab: var( mean(x,1), 1)   IDL: print, var_n( total(x,1)/NLONS )
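For reference, a minimal Matlab sketch of these quantities (assuming x as loaded above; the anonymous var_n simply mirrors the IDL var_n.pro helper and is not part of the course code):
var_n = @(a) mean(a(:).^2) - mean(a(:)).^2;     % /N variance, like the IDL var_n.pro
grand_mean = mean(x(:));                        % spacetime mean (report with units)
grand_var  = var(x(:), 1);                      % spacetime /N variance, same as var_n(x)
grand_std  = sqrt(grand_var);                   % spacetime standard deviation
spatial_var_of_time_mean = var(mean(x,2), 1);   % variance over lon of the 144-point time-mean section
temporal_var_of_lon_mean = var(mean(x,1), 1);   % variance over time of the 240-point lon-mean series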
3. Further decomposition of spacetime variations: Mean seasonal cycle and 'climate anomaly'
Necessary code to make the 12-month mean seasonal cycle:
We have to rotate the time dimension to be contiguous in memory (the first array dimension), then reinterpret (reform or reshape) it as months x years, then average over the years.
Matlab:
xx = reshape( permute(x, [2 1]), 12, NYEARS, NLONS);
climxx12 = squeeze( mean(xx, 2) );
climx12 = permute( climxx12, [2 1]);
IDL:
xx = reform( transpose(x), 12, NYEARS, NLONS)
climxx12 = total(xx, 2) / NYEARS
climx12 = transpose(climxx12)
Subtract that mean annual cycle, to get the traditional climate definition of "anomalies" (temporal anomalies):
Matlab:
anomx = reshape(x, NLONS, 12, NYEARS);   % declare a right-sized array, initially equal to x
for iy = 1:NYEARS
    anomx(:,:,iy) = anomx(:,:,iy) - climx12;   % subtract the climatology
end
anomx = reshape(anomx, NLONS, NMONS);
IDL:
anomx = reform(x, NLONS, 12, NYEARS)   ; declare a right-sized array, initially equal to x
for iy = 0, NYEARS-1 do anomx(*,*,iy) = anomx(*,*,iy) - climx12
anomx = reform(anomx, NLONS, NMONS)
3. (the assignment part)
Confirm that the 20-year mean of the anomalies as defined above is 0. Write math (on paper, for yourself) that proves it/ shows why.
Is the spatial mean of the climate anomalies (as defined above) 0? Is it the same as the time series of the spatial mean of the raw data? Or is it a new object?
My CLIMATOLOGICAL ANNUAL CYCLE has variance: _ (units). Matlab: var( climx12(:), 1)   IDL: print, var_n(climx12)
My INTERANNUAL ANOMALY ARRAY has variance: _ (units). Matlab: var( anomx(:), 1)   IDL: print, var_n(anomx)
b) purely spatial (variance of TIME mean at each lon)
c) temporal anomalies (x minus its TIME mean at each lon)
d) purely temporal (variance of LON mean at each time)
e) spatial anomalies (x minus its LON mean at each time)
f) remove both means (space-time variability)
g) mean seasonal cycle
h) deseasonalized anomalies
i) variance of longitudinal mean of part h)
j) h minus i (anomalies from both space and monthly-climatological means) -- a partial code sketch of a few of these follows below
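A minimal Matlab sketch of the zero-mean check and a few of the variance components above (assuming x, anomx, climx12, NLONS, NYEARS, and NMONS as defined earlier; this is a starting point, not the full list):
chk = reshape(permute(anomx, [2 1]), 12, NYEARS, NLONS);   % months x years x lons
chk = mean(chk, 2);                                        % 20-year mean of the anomalies, per month and lon
max(abs(chk(:)))                                           % should be ~0 (roundoff only)
var_b = var(mean(x,2), 1);                                 % (b) purely spatial: variance of the TIME mean at each lon
var_g = var(climx12(:), 1);                                % (g) mean seasonal cycle
var_h = var(anomx(:), 1);                                  % (h) deseasonalized anomalies
var_i = var(mean(anomx,1), 1);                             % (i) variance of the longitudinal mean of (h)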
6. Discuss your results:
Write down all the decompositions of total variance that add up sensibly, and confirm them by noticing that the numbers work out.
Example: (a) = (g) + (h). Yes, it checks out: 100 = 23 + 77, or whatever the numbers are for your field x.
Extra credit for inventing a clear, consistent notation (extra extra credit if it can be typed in clunky Wiki editors)!
The usual simple prime and overbar notation is not enough; there are multiple kinds of averages and deviations from those averages.
A too-codey, not-mathy-enough example (take it or make up your own): _bart is the time mean, _pert is the perturbation in time, and the same for space x.
x = x_bart + x_pert
x = x_barx + x_perx
x = x_barx + (x_perx_bart + x_perx_pert)
etc.
Then we'd just need to work out the rules of it. For example, bar and per commute in x and t (one way to typeset this is sketched below).
Some sort of labeled squiggle for "per," just like the labeled overbar we use for the average, might make a workable math notation?
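Purely as an illustration (not a required or official notation), the _bar / _per idea above could be typeset by labeling both the overbar and the deviation operator with the dimension they act on, e.g. in LaTeX:
% labeled mean and labeled deviation operators (illustration only)
\overline{x}^{\,t} \equiv \frac{1}{N_t}\sum_{t} x, \qquad x^{\ast t} \equiv x - \overline{x}^{\,t}
\overline{x}^{\,x} \equiv \frac{1}{N_x}\sum_{x} x, \qquad x^{\ast x} \equiv x - \overline{x}^{\,x}
% the two decompositions written in this notation:
x = \overline{x}^{\,t} + x^{\ast t}, \qquad x = \overline{x}^{\,x} + x^{\ast x}
% "bar and per commute in x and t" becomes:
\overline{\bigl(x^{\ast x}\bigr)}^{\,t} = \bigl(\overline{x}^{\,t}\bigr)^{\ast x}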
4. Further decomposition of anomx by scale (using rebinning).
Rebin the data to coarser meshes, by averaging the values that fall in each coarse grid cell.
Your anomx array is 144 longitudes x 240 months.
The 144 space bins can be regridded using factors of 2 to 72, 36, 18, 9, and then (factor of 3) to 3 bins.
The 240 time bins can be regridded by factors of 2 to 120, 60, 30, 15, and then (by a factor of 3) to 5 bins.
Scale is the logarithm of size. It can be measured in decades (factors of 10) or octaves (factors of 2).
For example, 1km, 10km, 100km, 1000km are equally spaced scales (but exponentially growing sizes).
Rebinning the data averages out the smaller scale structure, and preserves the larger scale structure.
Variance as a function of bin scale measures the relative amount of structure at small vs. large scales.
Necessary code:
scalefactors = [1, 2, 4, 8, 16, 48]; % matlab or IDL are the same for these lines
IDL:
scales = alog(scalefactors)   ;;; measured in factors of e, not decades or octaves
variance_by_scalefactor = fltarr(6,6)
; a loop with multiple commands needs a .run ... end block if you paste it into the command line
; (the loops run 0..5 so that all six scale factors, including 48, are used):
.run
for it = 0, 5 do $
  for ix = 0, 5 do begin
    print, 'rebinning space to ' + string(NLONS/scalefactors(ix)) + ' bins, and time to ' + string(NMONS/scalefactors(it))
    a = rebin(anomx, NLONS/scalefactors(ix), NMONS/scalefactors(it))
    variance_by_scalefactor(ix,it) = var_n(a)
  endfor
end
Matlab:
scales = log(scalefactors);   % measured in factors of e, not decades or octaves
variance_by_scalefactor = zeros(6,6);   % Step 1: DECLARE the thing you want to build!
for it = 1:6
    for ix = 1:6
        ['rebinning space to ' num2str(NLONS/scalefactors(ix)) ' bins, and time to ' num2str(NMONS/scalefactors(it))]   % say what's happening while it runs
        a = blkproc(anomx, [scalefactors(ix) scalefactors(it)], 'mean2');
        variance_by_scalefactor(it,ix) = var(a(:), 1);
    end
end
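Note that blkproc is an older Image Processing Toolbox function (superseded by blockproc in recent Matlab releases). If it is unavailable, a toolbox-free block average can be sketched with reshape; blockmean below is a hypothetical helper (save it as blockmean.m), not part of the course code:
function B = blockmean(A, fx, ft)
% Average non-overlapping fx-by-ft blocks of A; fx and ft must divide size(A) evenly.
[nx, nt] = size(A);
B = squeeze(mean(mean(reshape(A, fx, nx/fx, ft, nt/ft), 1), 3));   % result is (nx/fx) x (nt/ft)
end
% usage inside the loop above, in place of the blkproc line:
% a = blockmean(anomx, scalefactors(ix), scalefactors(it));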
calculate the gradient of var with scale and then its magnitude - this part is not well tested!
The derivative of variance with scale shows where power (variance) is concentrated. The magnitude of the gradient is the "distribution" of variance with scale: it tells you at what scale the rebinning process reduced variance the most, which is the same as saying which scales in the data had the most variance.
IDL:
NS = n_elements(scalefactors)   ; 6
dvar1 = variance_by_scalefactor*0 & dvar2 = variance_by_scalefactor*0   ;;; right-sized arrays
; deriv(X, Y) returns dY/dX, so the scale axis goes first:
for ix = 0, NS-1 do dvar1[ix,*] = deriv(scales, reform(variance_by_scalefactor[ix,*]))
for it = 0, NS-1 do dvar2[*,it] = deriv(scales, reform(variance_by_scalefactor[*,it]))
variance_distrib = sqrt(dvar1^2 + dvar2^2)
contour, variance_distrib, scalefactors, scalefactors, /xlog, /ylog, $
    title='d(var)/d(scale) (var. dist): MYFIELDNAME', $
    xtit='rebin factor in lon', ytit='rebin factor in t'
Matlab:
[dvar1, dvar2] = gradient(variance_by_scalefactor, scales, scales);
variance_distrib = sqrt(dvar1.^2 + dvar2.^2)
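A hedged Matlab counterpart of the IDL contour call (assuming variance_distrib from the gradient lines above, with rows indexed by the time factor and columns by the lon factor, as in the Matlab loop):
contourf(scalefactors, scalefactors, variance_distrib); colorbar   % rows -> y (time factor), columns -> x (lon factor)
set(gca, 'XScale', 'log', 'YScale', 'log')
xlabel('rebin factor in lon'); ylabel('rebin factor in t')
title('d(var)/d(scale) (var. dist): MYFIELDNAME')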
4. Decomposition by scale (the assignment part)
Based on the variance_by_scalefactor diagram you make, what space and time scales (units: degrees and months) have especially prominent variance in your anomx field? These are the scales at which averaging reduces the variance the most. Make a contour plot of anomx, or use Milan's total plots in Getting data -- can you see these "characteristic" scales by eye?
Annotate the contour plot with some scale indications of about the right size (ovals in PowerPoint may be easiest).